# Object Detection and Segmentation

**Paligemma2 3b Mix 224 Jax** (google · Image-to-Text · JAX · 38 downloads · 1 like)
PaliGemma 2 is an upgraded vision-language model based on Gemma 2 that accepts multilingual image and text input and generates text output, designed for vision-language tasks.

**Paligemma2 28b Pt 896** (google · Image-to-Text · Transformers · 116 downloads · 48 likes)
PaliGemma 2 is a Vision-Language Model (VLM) from Google that combines the Gemma 2 language model with the SigLIP vision model, accepting image and text inputs and generating text outputs.

**Paligemma2 10b Pt 896** (google · Image-to-Text · Transformers · 233 downloads · 32 likes)
PaliGemma 2 is a Vision-Language Model (VLM) from Google that integrates Gemma 2's capabilities, accepting image and text input and generating text output.

**Paligemma2 10b Pt 448** (google · Image-to-Text · Transformers · 282 downloads · 14 likes)
PaliGemma 2 is Google's upgraded vision-language model (VLM) that builds on Gemma 2, accepting image and text input and generating text output.

**Paligemma2 10b Pt 224** (google · Image-to-Text · Transformers · 3,362 downloads · 8 likes)
PaliGemma 2 is a vision-language model (VLM) built on Gemma 2. It processes image and text inputs together, generates text outputs, supports multiple languages, and suits tasks such as image and short-video captioning, visual question answering, text reading, object detection, and object segmentation.

**Paligemma2 3b Pt 896** (google · Image-to-Text · Transformers · 2,536 downloads · 22 likes)
PaliGemma 2 is a multimodal vision-language model that combines image and text inputs to generate text outputs; it supports multiple languages and a range of vision-language tasks.

**Paligemma2 10b Mix 224** (google · Image-to-Text · Transformers · 701 downloads · 7 likes)
PaliGemma 2 is a vision-language model based on Gemma 2 that accepts image and text input and generates text output, suitable for a variety of vision-language tasks.

**Paligemma2 3b Mix 224** (google · Image-to-Text · Transformers · 15.23k downloads · 28 likes)
PaliGemma 2 is Google's upgraded vision-language model, combining Gemma 2's capabilities to accept image and text inputs and generate text outputs across a variety of vision-language tasks.
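The PaliGemma mix checkpoints listed above can perform object detection from a plain `detect <object>` prompt: the model answers with four `<locNNNN>` tokens per box, each a coordinate binned to 0–1023 and normalized by 1024, in y_min, x_min, y_max, x_max order, followed by the label. A minimal parsing sketch, assuming that documented output format (the function name is my own):

```python
import re

# PaliGemma "detect" output looks like:
#   "<loc0512><loc0256><loc1023><loc0768> cat ; <loc...>..."
# Four location tokens per box (y_min, x_min, y_max, x_max),
# each binned to 0..1023 and normalized by 1024, then the label.
_DET = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)

def parse_detections(text, width, height):
    """Convert PaliGemma detect output into pixel-space (x0, y0, x1, y1) boxes."""
    boxes = []
    for ymin, xmin, ymax, xmax, label in _DET.findall(text):
        boxes.append({
            "label": label.strip(),
            "box": (
                int(xmin) / 1024 * width,   # x_min in pixels
                int(ymin) / 1024 * height,  # y_min in pixels
                int(xmax) / 1024 * width,   # x_max in pixels
                int(ymax) / 1024 * height,  # y_max in pixels
            ),
        })
    return boxes
```

For example, `parse_detections("<loc0512><loc0256><loc1023><loc0768> cat", 1024, 1024)` yields one box labeled `cat` at (256.0, 512.0, 768.0, 1023.0).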
**Florence 2 Large No Flash Attn** (multimodalart · MIT · Image-to-Text · PyTorch · 73.91k downloads · 16 likes)
Florence-2 is an advanced vision foundation model developed by Microsoft that uses a prompt-based approach and a unified representation to handle diverse visual tasks such as image captioning and object detection.

**Florence 2 Base Ft** (lodestones · MIT · Image-to-Text · Transformers · 14 downloads · 0 likes)
Florence-2 is an advanced vision foundation model developed by Microsoft that uses a prompt-based approach to handle a wide range of vision and vision-language tasks.
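Florence-2's prompt-based interface selects the task purely through a special task token, with some tasks also taking free text after the token. A small sketch of composing such prompts; the task tokens are a subset from Microsoft's model card, and the helper name is my own:

```python
# A subset of Florence-2 task tokens (from the official model card).
# Some tasks take extra free-text input appended after the token.
TASKS_NO_INPUT = {"<CAPTION>", "<DETAILED_CAPTION>", "<OD>", "<OCR>"}
TASKS_WITH_INPUT = {"<CAPTION_TO_PHRASE_GROUNDING>"}

def build_prompt(task, text_input=None):
    """Compose the prompt string Florence-2 expects for a given task."""
    if task in TASKS_WITH_INPUT:
        if not text_input:
            raise ValueError(f"{task} requires a text input")
        return task + " " + text_input
    if task in TASKS_NO_INPUT:
        if text_input:
            raise ValueError(f"{task} takes no text input")
        return task
    raise ValueError(f"unknown task: {task}")
```

For instance, `build_prompt("<OD>")` returns the bare detection prompt, while `build_prompt("<CAPTION_TO_PHRASE_GROUNDING>", "a green car")` grounds the given phrase in the image.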
**Paligemma 3b Ft Widgetcap 224** (google · Image-to-Text · Transformers · 135 downloads · 2 likes)
PaliGemma is a versatile, lightweight vision-language model that combines image and text inputs to generate text outputs, supports multiple languages, and performs well on a range of vision-language tasks.

**Paligemma 3b Ft Scicap 448** (google · Image-to-Text · Transformers · 123 downloads · 0 likes)
PaliGemma is a versatile, lightweight vision-language model that combines image and text inputs to generate text outputs and supports multiple languages.

**Paligemma 3b Ft Cococap 224** (google · Image-to-Text · Transformers · 209 downloads · 1 like)
PaliGemma is a versatile, lightweight vision-language model (VLM) that supports multilingual input and output and suits a variety of vision-language tasks.

**Paligemma 3b Ft Nlvr2 224** (google · Image-to-Text · Transformers · 2,056 downloads · 1 like)
PaliGemma is a versatile, lightweight vision-language model (VLM) that supports multilingual input and output and excels at vision-language tasks such as image captioning and visual question answering.

**Paligemma 3b Mix 448** (google · Image-to-Text · Transformers · 5,488 downloads · 109 likes)
PaliGemma is a versatile, lightweight vision-language model (VLM) built on the SigLIP vision model and the Gemma language model, accepting image and text inputs and generating text outputs.

**Paligemma 3b Ft Nlvr2 448** (google · Image-to-Text · Transformers · 2,350 downloads · 0 likes)
PaliGemma is a versatile, lightweight vision-language model (VLM) that accepts image and text input and generates text output, suitable for a variety of vision-language tasks.

**Paligemma 3b Ft Vqav2 224** (google · Image-to-Text · Transformers · 150 downloads · 2 likes)
PaliGemma is a versatile, lightweight vision-language model that combines image and text inputs to generate text outputs and supports multiple languages.

**Paligemma 3b Ft Docvqa 896** (google · Image-to-Text · Transformers · 519 downloads · 9 likes)
PaliGemma is a lightweight vision-language model developed by Google, built on the SigLIP vision model and the Gemma language model, supporting multilingual image-text understanding and generation.

**Paligemma 3b Pt 224** (google · Image-to-Text · Transformers · 38.40k downloads · 318 likes)
PaliGemma is a versatile, lightweight vision-language model (VLM) built on the SigLIP vision model and the Gemma language model, processing image and text inputs together to generate text outputs.

**Paligemma 3b Ft Scicap 224** (google · Image-to-Text · Transformers · 107 downloads · 0 likes)
PaliGemma is a lightweight vision-language model that combines image and text inputs to generate text outputs, with multilingual, multi-task support.

**Paligemma 3b Ft Ocrvqa 896** (google · Image-to-Text · Transformers · 2,056 downloads · 14 likes)
PaliGemma is a versatile, lightweight vision-language model that accepts image and text input and generates text output, suitable for a variety of vision-language tasks.

**Paligemma 3b Ft Science Qa 224** (google · Image-to-Text · Transformers · 113 downloads · 1 like)
PaliGemma is a versatile, lightweight vision-language model (VLM) that accepts image and text input and generates text output, suitable for a variety of vision-language tasks.
© 2025 AIbase